Mining Contrast Inequalities in Numeric Dataset

Authors

  • Lei Duan
  • Jie Zuo
  • Tianqing Zhang
  • Jing Peng
  • Jie Gong

Abstract

Finding relational expressions that hold frequently in one class of data but not in the other class is an interesting problem. In this paper, a relational expression of this kind is defined as a contrast inequality. Gene Expression Programming (GEP) is a powerful tool for discovering relations in data and expressing them as mathematical formulas, so it is natural to apply GEP to this mining task. The main contributions of this paper include: (1) introducing the concept of contrast inequality mining, (2) designing a two-genome chromosome structure that guarantees each individual in GEP is a valid inequality, (3) proposing a new genetic mutation that improves the efficiency of evolving contrast inequalities, (4) presenting a GEP-based method to discover contrast inequalities, and (5) an extensive performance study on real-world datasets. The experimental results show that the proposed methods are effective: contrast inequalities with high discriminative power are discovered from the real-world datasets. Some potential future work on contrast inequality mining is also discussed.
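To make the idea of a contrast inequality concrete, the sketch below decodes a two-gene GEP chromosome into an inequality `lhs < rhs` and scores it by the difference in support between two classes. It is a minimal illustration under stated assumptions, not the authors' method: the function set (`+`, `-`, `*`), the fixed `<` operator, the Karva (breadth-first) gene encoding, and the support-difference score are all assumptions chosen for illustration, and the names `eval_gene`, `satisfies`, `support`, and `contrast` are hypothetical.

```python
import operator
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed function set for illustration: symbol -> (arity, implementation).
FUNCTIONS: Dict[str, Tuple[int, Callable[[float, float], float]]] = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
}

Row = Dict[str, float]          # one record: attribute name -> numeric value
Chromosome = Tuple[str, str]    # hypothetical two-genome layout: (lhs, rhs)


def arity(sym: str) -> int:
    """Terminals (attribute names) have arity 0."""
    return FUNCTIONS[sym][0] if sym in FUNCTIONS else 0


def eval_gene(gene: str, row: Row) -> float:
    """Evaluate a gene written in Karva (breadth-first) notation on one record."""
    # 1. Open reading frame: how many leading symbols the expression tree uses.
    length, pos = 1, 0
    while pos < length:
        length += arity(gene[pos])
        pos += 1
    # 2. Breadth-first layout: each node's children follow those of earlier nodes.
    children: List[range] = []
    nxt = 1
    for p in range(length):
        children.append(range(nxt, nxt + arity(gene[p])))
        nxt += arity(gene[p])
    # 3. Children always have larger indices, so evaluate from right to left.
    values = [0.0] * length
    for p in reversed(range(length)):
        sym = gene[p]
        if sym in FUNCTIONS:
            values[p] = FUNCTIONS[sym][1](*(values[c] for c in children[p]))
        else:
            values[p] = row[sym]
    return values[0]


def satisfies(chrom: Chromosome, row: Row) -> bool:
    """Gene 0 is the left side, gene 1 the right side; the operator is fixed,
    so every individual decodes to a syntactically valid inequality."""
    lhs, rhs = chrom
    return eval_gene(lhs, row) < eval_gene(rhs, row)


def support(chrom: Chromosome, rows: Sequence[Row]) -> float:
    """Fraction of records in one class that satisfy the inequality."""
    return sum(satisfies(chrom, r) for r in rows) / len(rows)


def contrast(chrom: Chromosome, target: Sequence[Row], other: Sequence[Row]) -> float:
    """Assumed score: support in the target class minus support in the other."""
    return support(chrom, target) - support(chrom, other)


# Toy data, invented for illustration only.
class_a = [{"a": 1.0, "b": 2.0, "c": 5.0}, {"a": 2.0, "b": 1.0, "c": 4.0}]
class_b = [{"a": 3.0, "b": 3.0, "c": 1.0}, {"a": 1.0, "b": 1.0, "c": 4.0}]
ineq = ("+ab", "cab")  # decodes to a + b < c; the tail "ab" of gene 1 is unused
print(contrast(ineq, class_a, class_b))  # prints 0.5
```

Here `a + b < c` holds for both records of `class_a` (support 1.0) and for one of the two records of `class_b` (support 0.5), giving a contrast of 0.5. Keeping the relational operator outside the genes is one simple way to ensure every chromosome is a valid inequality; the paper's actual chromosome structure and mutation operator may differ.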

Similar resources

Clustering Mixed Numeric and Categorical Data: A Cluster Ensemble Approach

Clustering is a widely used technique in data mining applications for discovering patterns in underlying data. Most traditional clustering algorithms are limited to handling datasets that contain either numeric or categorical attributes. However, datasets with mixed types of attributes are common in real life data mining applications. In this paper, we propose a novel divide-and-conquer techniq...

An Empirical Study of Similarity Search in Stock Data

Using certain artificial intelligence techniques, stock data mining has given encouraging results in both trend analysis and similarity search. However, representing stock data effectively is a key issue in ensuring the success of a data mining process. In this paper, we aim to compare the performance of numeric and symbolic data representation of a stock dataset in terms of similarity search. ...

Mining Frequent Ranges of Numeric Attributes via Ant Colony Optimization for Continuous Domains without Specifying Minimum Support

Currently, all search algorithms that use discretization of numeric attributes for numeric association rule mining work in such a way that the original distribution of the numeric attributes is lost. This leads to loss of information, so the association rules generated through this process are not precise and accurate. Based on this fact, algorithms which can natively h...

Numeric Multi-Objective Rule Mining Using Simulated Annealing Algorithm

Abstract: ... as a single objective one. Measures such as support, confidence and other interestingness criteria, which are used for evaluating a rule, can be thought of as different objectives of the association rule mining problem. Support count is the number of records that satisfy all the conditions in the rule. This objective represents the accuracy of the rules extracted from the da...

Mining Oblique Data with XCS

The classifier system XCS was investigated for data mining applications where the dataset discrimination surface (DS) is generally oblique to the attribute axes. Despite the classifiers' hyper-rectangular predicates, XCS reached 100% performance on synthetic problems with diagonal DS's and, in a train/test experiment, competitive performance on the Wisconsin Breast Cancer dataset. Final classifier...

Publication date: 2010